Abstract



This article describes an evaluation I conducted of courses I was teaching part-time. I
compared the results of a test, found in a journal, which I gave at three prominent
universities in the Sapporo region. I changed the instruction of the courses I evaluated to
make them more similar communicatively to the content of the courses I was giving at the
other universities. Qualitatively, the instrument, although developed by the original author
to investigate L1 pro-drop parameter influence, shows promise for validation as a
measure of proficiency where other testing may produce less exacting scores. The
article is intended for readers interested in communicative course development, testing,
and course evaluation at Japanese universities.







Issues in Course Development, Evaluation, and Testing: 
A Case Study from Japan 



Lorne O. Kirkwold

Hokkai-Gakuen University, Sapporo, Japan 




Introduction 

There are various approaches to course development. A traditional means of 
elaborating students’ needs is to frame objectives linguistically. This can involve the 
comparison of linguistic structures in the two languages and evaluating the difficulty of the 
structures to be practiced in the foreign language. Those structures that are most 
different between the two languages will presumably require the most attention. Lado’s
(1957) work is one of the best known discussing such an analysis for developing
objectives. In more recent curriculum innovations, objectives are framed around
communicative tasks; sometimes these tasks are the results of a needs analysis. Nunan
(1989) describes communicative tasks aimed at developing fluency. Long (1985)
considers how such tasks can be graded.



Course development case study 

Often when we have decisions to make about program development, we have a chance
to exchange our ideas with our colleagues. Indeed, this has been my experience at
Hokkai-Gakuen University (HGU), my full-time job, where we have developed a regular
stream of skill-based courses in Reading, Writing, Speaking, and Listening for students
to pursue during their first two years of studies in our department of English language
and culture. In these same skill areas are parallel courses in the seminar stream, ideally
for students who demonstrate greater aptitude. This brings me to the problem which I
would like to describe as the focus for this paper. In addition to my regular job at HGU,
a private institution in Sapporo, I also have two part-time jobs outside: the first at
Hokkaido University (HU), the city’s national university; the second at Otaru University of
Commerce (OUC), also a national university within commuting distance. In Japan, a
foreign-language course at a university is given during a ninety-minute period with 
meetings scheduled once a week. Depending on the institution, there will be 12 to 15 
classes per semester. It is usual for classes to meet a full year for a total of 24 to 30 
classes, but some language courses run only a semester. At my regular job, I have 
worked closely with my colleagues in determining what the objectives should be in the 
courses we teach. These are reflected in the descriptions we have written jointly for the 
language-skill courses of a year’s length and the department’s list of textbooks which we 
have chosen together. At my part-time job at HU, two of the colleagues in my usual
department are employed at the same time. We collaborate to determine
what we are going to teach in English (Eigo) I. This is in addition to objectives and
requirements set in English by HU faculty members. It is a general one-semester
course, whose broad objective is conversational speaking, in which we have had
enrolled day students from education, law, medicine, dentistry, engineering, agriculture,
and veterinary medicine. Going to my second part-time job, although I may see other
colleagues and “talk shop,” I have not found the same amount of formal exchange about the
objectives for English courses at OUC. My assignment there is English I in the daytime
and English II for the night students. These courses run the full year.

I began my paper by mentioning the alternatives for the development of a 
language program of either linguistic or communicative objectives. As for my selection 
on my own at OUC, I chose the textbook Clear Speech. Readers interested in the 
course design are referred to the Teacher’s Resource Book (Gilbert, 1993). It is a 
compromise I see between these poles, and I use this book for both courses there. 
Had the text been written at an earlier time, it might have been critiqued by Rivers (1964)
as an example of the audio-lingual method. Certainly there is opportunity for repetition
using the accompanying
recording. Dialogues are also modeled tightly. In the early chapters, dictation exercises 
are the focus of listening practice. Gilbert does not prescribe memorization though. The 
content of the course is a systematic presentation of the problematic sounds of the 
language (for example, voiced and unvoiced th, bAr, the plural -s, and -ed) in addition 
to a revision of stress, rhythm, linking and contractions. These features are generally 
explained and modeled by the teacher or the program on the tape. After this, there
may be some choral repetitions. Then students practice the features they have revised
in pair work, essentially short questions and answers. The material seems to assume 
some previous study of the spoken features of the language, and in this regard, the 
textbook accommodates a large number of students readily. I suspect they have 
studied English in similar ways in high school. 

This is the position from which I started, but I was more satisfied with the greater 
focus on the communicative aspect of language learning both at HU and HGU. For 
example, in speaking courses at these universities, we use Marathon Mouth and 
Marathon Mouth Plus. Readers interested in the course design are referred respectively 
to Marathon Mouth Teacher’s Edition (Koustaff, Gaston, & Shimizu, 2000) and Marathon
Mouth Plus Teacher’s Edition (Shimizu & Gaston, 2000). The pair work in a chapter of
this textbook series typically consists of information gap activities thematically
developed. Afterwards, the focus of the recordings is often listening for specific
information. The interview activity which comes near the end of the chapter allows 
students to exchange information of personal interest among themselves and with the 
teacher. These exchanges can be cultural. In Speaking Seminar 2 at HGU the year I 
gathered data for this study, I used Amazing! Interviews and Conversations (Bates,
1993). The recorded material consists of interviews, albeit scripted, developed around
human interest stories in Canadian newspapers. Not only are there comprehension 
exercises focusing on specific details and note-taking activities, the material also offers 
ample opportunity for instruction about Canadian culture. In contrast, teaching my 
lessons at OUC, I felt the instruction was “hot and cold.” I was not certain either as to the 
level of students relative to those at the other two institutions, where I had been teaching 
for a longer period of time. I decided at the beginning of the school year in 2001 that I 
would administer the same test to all classes at the three institutions in order to compare 
levels of proficiency. 



The instrument and results 

The instrument I chose was a test developed for other purposes in the universal
grammar (UG) literature. What aroused my curiosity was testing the pro-drop
parameter in a Japanese context. White (1985) used the instrument for speculating
about the nature of pro-drop transfer of Spanish speakers learning English at McGill
University in Montreal, Canada.¹ The test consists of thirty-one items, each requiring a
grammaticality judgment. Six items test missing subjects. There are 13 Wh- questions.
Interspersed among the questions are four sentences with affirmative intent whose
subjects and verbs are ungrammatically inverted (presumably following allowable
Spanish order) and a fifth with stylistic inversion following an existential “there.” Six
items, of which five are examples of relative clauses, test subordination. Finally, there is
one example of the SVC structure. A student takes the test by indicating grammaticality
in the following way. Marking OK for a given item indicates the student believes the
sentence is correct. In the case of incorrect sentences, the student is intended to correct
the mistake(s) and copy the corrected version on the line below. In order to score a
point, a student must indicate OK appropriately or, in the case of an incorrect sentence,
provide a suitable correction. There are no partial points. Although there are 31
items on the original test, when I scored the test myself, I excluded 6 items. A perfect
score would therefore be 25 in my results. This is the greatest modification I made in the
administration of the test compared to White’s description. I administered the test to the
English I classes at HU on April 11, 2001 and October 3, 2001, and to the English I and II
classes at OUC on October 4, 2001. HGU students were tested between April 12
and 17, 2001, with the exception of Speaking Seminar 2 students, who were tested
November 14, 2001. Students were in attendance with only a few exceptions;
generally the absentees took the test at other times. The results follow below in rank
order:



Score  U    Course                  n   Yr  D/N  Major                     Text
16.69  HU   English I (64)          32  1   D    Agriculture & Veterinary  MM+
16.56  HGU  Writing Seminar 3        9  3   D    English
16.10  HU   English I (35)          30  1   D    Medical / Dental          MM
14.93  HU   English I (43)          29  1   D    Engineering               MM+
14.69  HGU  Speaking Seminar 2      13  3   N    English                   Amazing!
14.22  OUC  English I (E-133B)      32  1   D    Commerce                  CS
14.17  HU   English I (9)           29  1   D    Education / Law           MM
13.38  OUC  English II (E-23C)      45  2   N    Commerce                  CS
12.88  HGU  Speaking 2              24  2   D    English                   MM+
12.86  HGU  Speaking/Listening Sem  14  1   D    Japanese                  MM
12.29  HGU  Writing Seminar I       17  2   N    English
12.10  HGU  Reading 1               29  1   D    English
11.68  HGU  Listening 1             22  1   D    English
11.59  HGU  Reading 1               22  1   N    English

¹ I extend my thanks to Professor White for allowing me to use her instrument.




Beside each score is given the university, course (with section number indicated in 
parentheses), and number of students, along with information about the students’ year, 
time at which they study (D for ‘day’ and N for ‘night’), and major. Lines containing OUC 
results are highlighted in bold font. For the purpose of making comparisons among the 
speaking courses, I have indicated textbooks: MM for Marathon Mouth, MM+ for 
Marathon Mouth Plus, CS for Clear Speech, Amazing! for Amazing! Interviews and
Conversations.



Discussion 

Describing my ranking qualitatively is the first step towards the validation of the 
test for this purpose. Students at Hokkaido University occupy the upper ranks. There 
are two third-year Hokkai-Gakuen seminars for English majors that rank among the 
Hokkaido University results. The OUC day students rank one place above the lowest 
results for Hokkaido University; the OUC night students, one place below. The
second-year English majors at HGU follow: a day class and a night class, with the
first-year Japanese majors’ results falling between them. In other words, second-year
daytime HGU English majors placed higher than did first-year daytime Japanese 
Studies students, who placed higher than second-year nighttime English majors. The 
last three results are those of first-year HGU English majors, with the night students 
ranking the lowest. 
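
For readers who wish to check the rank positions described above, the following is a minimal sketch in Python. The mean scores are copied from the results table; the function name and the rule for splitting the ranked list into three near-equal bands are my own illustrative assumptions, not part of White's instrument or of any standard procedure.

```python
# Mean scores (out of 25) copied from the results table.
RESULTS = [
    ("HU", "English I (64)", 16.69),
    ("HGU", "Writing Seminar 3", 16.56),
    ("HU", "English I (35)", 16.10),
    ("HU", "English I (43)", 14.93),
    ("HGU", "Speaking Seminar 2", 14.69),
    ("OUC", "English I (E-133B)", 14.22),
    ("HU", "English I (9)", 14.17),
    ("OUC", "English II (E-23C)", 13.38),
    ("HGU", "Speaking 2", 12.88),
    ("HGU", "Speaking/Listening Sem", 12.86),
    ("HGU", "Writing Seminar I", 12.29),
    ("HGU", "Reading 1 (day)", 12.10),
    ("HGU", "Listening 1", 11.68),
    ("HGU", "Reading 1 (night)", 11.59),
]


def rank_into_thirds(rows):
    """Rank classes by mean score (highest first) and label each with a band.

    The banding rule is an assumption made for illustration: the ranked
    list is cut at one third and two thirds of its length.
    """
    ordered = sorted(rows, key=lambda row: row[2], reverse=True)
    total = len(ordered)
    ranked = []
    for position, (university, course, mean) in enumerate(ordered, start=1):
        if position <= total / 3:
            band = "upper"
        elif position <= 2 * total / 3:
            band = "middle"
        else:
            band = "lower"
        ranked.append((position, university, course, mean, band))
    return ranked


if __name__ == "__main__":
    for position, university, course, mean, band in rank_into_thirds(RESULTS):
        print(f"{position:2d}  {mean:5.2f}  {university:<4}{course:<24}{band}")
```

With fourteen classes, this particular cut places ranks 1 to 4 in the upper band, 5 to 9 in the middle, and 10 to 14 in the lower; a different cut would move the boundaries slightly.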

In brief, my interpretation of these results is that the two courses at OUC are both 
in the middle third, in terms of proficiency, of my teaching assignments. I can use these 
data to make a case for the desirability of switching the focus in these classes. We 
continued to use the textbook, but for approximately 30 minutes each class. In the
remaining hour, I pursued activities which I believed would challenge the students more in
terms of their “communicative competence” (Canale & Swain, 1980). Such activities
included watching videos developed for EFL instruction; for example, Laugh and Learn
with Mr. Bean (Hamada & Akimoto, 2001), Lost Secret 2000 (O’Neill, 1996) and
Family Album, USA (Kelty, 1991). The authors of these materials have their own
design; however, I used these videos for their culture and storyline in the following way.
We would view an entire episode at one time. I would then direct a structured 
production task. Students were to volunteer one or two sentences about what they had 
seen. I would write these sentences on the board, and then students would do a writing 
assignment based on the viewing of one (or more) of these episodes. My satisfaction 
with the change was such that I taught the entire 2002 - 2003 year in this way. At the 
end of that course, three students wrote these comments in English in an evaluation 
whose results were returned to me by OUC: 

“Video assignment is very nice for me. I wanna be more good listener of 
English, so this class is good for me. It’s important for me to listen and speak English. 
This is much for me.” 

“I enjoyed this class. To watch many videoes is good.” 

“This class is very interesting. The teacher used video much. The teacher 
speaks only English.” 

Evaluation of my study and directions for further research 

An exercise such as this one may be useful for others who are in a similar 
situation. Many foreign-language teachers in Japan teach part-time but likely have little 
time to collaborate with colleagues. I have found giving a similar test to students at three 
different universities a helpful means of assessing an appropriate level in my teaching or 
confirming that I am on target. The comparison I have presented has allowed me to 
make decisions with more confidence about the nature of the material I can present to a 
certain group of students. What I believe to be novel about my research is that I have 
found in the literature an instrument suitable for analysis while conducting my own 
evaluation. Researchers develop instruments but it is rare to see discussion of their 
suitability by others as articles in language learning journals. Discussion of validity may 
well be abundant, but it is usually the same researcher who ascertains the suitability of 
his or her instrument. I believe disseminating a developing instrument through the 
journals will enhance its validity by increasing reliability. Colleagues in various places can 
confirm that the instrument is providing useful assessments by presenting data such as 
those I have gathered. To this end, a relatively objective test, like the one I have used, 
may ensure a certain level of scoring consistency. The bands of communicative tests
may include descriptors which are not apt to be interpreted in the same way by
native-speaking and non-native-speaking colleagues. Looking for a suitable means for
assessment, we may consider the need for a “finer” instrument required in situations like
my own where the relative homogeneity of the students would otherwise result in
placing the majority in the same proficiency band.

At the same time, there is a body of literature associated with instruments
described in journals. This may provide insight to the classroom teacher who is trying to
understand how students may be learning and why particular points are problematic. In
UG literature, the description of parameters may assist the teacher in understanding the
development of non-mother tongue learning.

Caveats based on this experience 

A journal article will not completely describe all steps required in the
development of the instrument. In order to assist our colleagues, we should document,
with explanation, the amendments we make. For example, I had to add directions and
examples to the instrument before I administered it. I also tried out the test myself in
order to determine how I would score the results. Then I discovered that some familiarity
with the students’ responses before scoring is desirable in determining allowable correct
answers. Consider item #5: “The policeman didn’t know when did escape the
prisoner.” The expected correction is “The policemen didn’t know when the prisoner
escaped,” and “The policemen didn’t know when the prisoner did escape” is acceptable
too. On the other hand, the students’ correction, “The policemen didn’t know when the
prisoner had escaped” requires some judgment for the scoring. I have decided not to
allow it because it is not the simplest correction. Although I administered the 31 items as
they appear in the reference, I came to exclude from scoring the six items as numbered
in the appendix (White, 1985, p. 62) which follow: #4, 11, 20, 23, 24, and 29. As for
#4, White mentions herself that the pronoun “it” in this item could be omitted in spoken
English. This makes scoring the item ambiguous. Similarly, #11, with its existential
“there,” seems plausible for students familiar with poetry or literature. White, however,
intended this item as ungrammatical. Similar discussion could be written about the other
items I have excluded from scoring. In these instances, native-speaker comparisons
would be useful. The breakdown may prove interesting: some would be acceptable
to a certain number of native speakers. In such cases, the explanation may be related
to variation in dialects. In this train of thought, answer keys annotated with allowable
answers and discussion even in the article itself about how the wrong answers came to
be excluded may facilitate the development of a testing instrument in various places.
Explanation shared in this way may be of interest to readers wishing to experiment with
the instrument.
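
As one way of documenting such an annotated answer key, the scoring rule I have described (one point per item, no partial points, six items excluded, a maximum of 25) can be written out as a short Python sketch. Only the entry for item #5 is taken from the discussion above; the entry for item #4 is a placeholder included solely to show that excluded items are skipped, and the function names and the normalisation of punctuation and capitalisation are my own assumptions for illustration.

```python
TOTAL_ITEMS = 31
EXCLUDED_ITEMS = {4, 11, 20, 23, 24, 29}  # items left out of scoring
MAX_SCORE = TOTAL_ITEMS - len(EXCLUDED_ITEMS)

# Annotated answer key: "OK" for a grammatical item, otherwise the set of
# allowable corrections. Only item 5 comes from the article.
KEY = {
    4: "OK",  # placeholder value only; item 4 is excluded and never scored
    5: {
        "The policemen didn't know when the prisoner escaped.",
        "The policemen didn't know when the prisoner did escape.",
    },
}


def normalise(sentence):
    """Ignore case, spacing, final full stop, and curly apostrophes (an assumption)."""
    text = sentence.replace("\u2019", "'").lower().rstrip(". ")
    return " ".join(text.split())


def score_test(responses, key):
    """Return the number of points earned: one per correct item, no partial points."""
    points = 0
    for item, answer in key.items():
        if item in EXCLUDED_ITEMS:
            continue  # administered but not counted
        response = responses.get(item, "")
        if answer == "OK":
            points += int(response.strip().upper() == "OK")
        else:
            allowable = {normalise(correction) for correction in answer}
            points += int(normalise(response) in allowable)
    return points


if __name__ == "__main__":
    student = {5: "The policemen didn't know when the prisoner had escaped."}
    print(score_test(student, KEY), "of a possible", MAX_SCORE)
```

Keeping the allowable corrections in the key itself means that a decision such as disallowing “had escaped” is recorded in one place and can be revised by a colleague who judges the item differently.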